Computational Text Analysis for Social Science: Model Assumptions and Complexity
Abstract
Across many disciplines, interest is increasing in the use of computational text analysis in the service of social science questions. We survey the spectrum of current methods, which lie on two dimensions: (1) computational and statistical model complexity; and (2) domain assumptions. This comparative perspective suggests directions of research to better align new methods with the goals of social scientists.

1 Use cases for computational text analysis in the social sciences

The use of computational methods to explore research questions in the social sciences and humanities has boomed over the past several years, as the volume of data capturing human communication (including text, audio, video, etc.) has risen to match the ambitious goal of understanding the behaviors of people and society [1]. Automated content analysis of text, which draws on techniques developed in natural language processing, information retrieval, text mining, and machine learning, should be properly understood as a class of quantitative social science methodologies. Employed techniques range from simple analysis of comparative word frequencies to more complex hierarchical admixture models. As this nascent field grows, it is important to clearly present and characterize the assumptions of techniques currently in use, so that new practitioners can be better informed as to the range of available models.

To illustrate the breadth of current applications, we list a sampling of substantive questions and studies that have developed or applied computational text analysis to address them.

• Political Science: How do U.S. Senate speeches reflect agendas and attention? How are Senate institutions changing [27]? What are the agendas expressed in Senators’ press releases [28]? Do U.S. Supreme Court oral arguments predict justices’ voting behavior [29]? Does social media reflect public political opinion, or forecast elections [12, 30]? What determines international conflict and cooperation [31, 32, 33]? How much did racial attitudes affect voting in the 2008 U.S. presidential election [34]?

• Economics: How does sentiment in the media affect the stock market [2, 3]? Does sentiment in social media associate with stocks [4, 5, 6]? Do a company’s SEC filings predict aspects of stock performance [7, 8]? What determines a customer’s trust in an online merchant [9]? How can we measure macroeconomic variables with search queries and social media text [10, 11, 12]? How can we forecast consumer demand for movies [13, 14]?

• Psychology: How does a person’s mental and affective state manifest in their language [15]? Are diurnal and seasonal mood cycles cross-cultural [16]?

• Scientometrics/Bibliometrics: What are influential topics within a scientific community? What determines a paper’s citations [35, 36, 37, 38]?
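As a concrete illustration of the simplest end of the spectrum described above, comparative word frequencies, the following is a minimal sketch that scores each word by a smoothed log-odds ratio between two groups of documents. It is not taken from the paper: the function names (`tokenize`, `log_odds_ratio`), the smoothing constant `alpha`, and the two toy corpora are all assumptions made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def log_odds_ratio(corpus_a, corpus_b, alpha=0.5):
    """Score each word by its smoothed log-odds ratio between two corpora.

    Positive scores mark words relatively more frequent in corpus_a,
    negative scores mark words relatively more frequent in corpus_b.
    alpha is an additive smoothing constant (an illustrative choice).
    """
    counts_a = Counter(tok for doc in corpus_a for tok in tokenize(doc))
    counts_b = Counter(tok for doc in corpus_b for tok in tokenize(doc))
    vocab = set(counts_a) | set(counts_b)
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    size = len(vocab)

    scores = {}
    for word in vocab:
        # Smoothed relative frequency of the word in each corpus.
        p_a = (counts_a[word] + alpha) / (total_a + alpha * size)
        p_b = (counts_b[word] + alpha) / (total_b + alpha * size)
        scores[word] = math.log(p_a / (1 - p_a)) - math.log(p_b / (1 - p_b))
    return scores


# Toy data, invented for this sketch only.
group_a = ["tax cuts tax reform"]
group_b = ["health care health reform"]
scores = log_odds_ratio(group_a, group_b)

# Print words from most group_a-leaning to most group_b-leaning.
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word:10s} {score:+.3f}")
```

A frequency comparison of this kind makes almost no domain assumptions and has very low model complexity, which is why it is a useful baseline before moving to richer models such as hierarchical admixture models.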